432 Class 23

Thomas E. Love, Ph.D.

2025-04-08

Today’s Topic

Cox models for time-to-event data

  • Rossi’s study of recidivism in people released from Maryland state prisons
  • Reviewing what we’ve done so far using coxph()
  • Using cph() from rms to fit three different Cox models

Setup

knitr::opts_chunk$set(comment=NA)

library(janitor)
library(naniar)
library(here)
library(conflicted)
library(broom)
library(gt)
library(rms)
library(survival)
library(survminer)
library(easystats)
library(tidyverse)

theme_set(theme_bw())

The Rossi Data

This data set is originally from Rossi et al. (1980). The data pertain to 432 convicts who were released from Maryland state prisons in the 1970s and who were followed up for one year after release. Half the released convicts were assigned at random to an experimental treatment in which they were given financial aid; half did not receive aid. Details on Rossi data variables and descriptions are provided at this link.

  • We will use a subset of the available variables today.
  • Subject ID code (s_id) included in the rossi.csv file.

Variable Descriptions for rossi_432

Variable Description
s_id Subject ID code
week week of first arrest after release or censoring; all censored observations are censored at 52 weeks.
arrest 1 if arrested, 0 if not
fin financial aid: no or yes (main treatment of study)
age age in years at time of release
race two categories: black or other
wexp full-time work experience before incarceration: no or yes
prio number of convictions prior to current incarceration

Ingesting the data into rossi_432

The rossi.csv file on our 432 data page includes 432 observations on 63 variables, but we’ll look at just the 8 we listed on the last slide.

rossi_432 <- read_csv(here("c23/data/rossi.csv"), show_col_types = FALSE) |>
  select(s_id, week, arrest, fin, age, race, wexp, prio)  |>
  mutate(across(where(is.character), as_factor),
         s_id = as.character(s_id))

dim(rossi_432)
[1] 432   8
n_miss(rossi_432)
[1] 0

The rossi_432 tibble

rossi_432
# A tibble: 432 × 8
   s_id   week arrest fin     age race  wexp   prio
   <chr> <dbl>  <dbl> <fct> <dbl> <fct> <fct> <dbl>
 1 S-001    20      1 no       27 black no        3
 2 S-002    17      1 no       18 black no        8
 3 S-003    25      1 no       19 other yes      13
 4 S-004    52      0 yes      23 black yes       1
 5 S-005    52      0 no       19 other yes       3
 6 S-006    52      0 no       24 black yes       2
 7 S-007    23      1 no       25 black yes       0
 8 S-008    52      0 yes      21 black yes       4
 9 S-009    52      0 no       22 black no        6
10 S-010    52      0 no       20 black yes       0
# ℹ 422 more rows

Create a survival object

rossi_432$S <- with(rossi_432, Surv(week, arrest == 1))

head(rossi_432$S)
[1] 20  17  25  52+ 52+ 52+

All subjects were followed either for 52 weeks after their release, or until they were (re-)arrested. So all “censored” subjects will have 52 weeks without re-arrest.

Compare the two financial aid groups

km_fin <- survfit(S ~ fin, data = rossi_432)

km_fin
Call: survfit(formula = S ~ fin, data = rossi_432)

          n events median 0.95LCL 0.95UCL
fin=no  216     66     NA      NA      NA
fin=yes 216     48     NA      NA      NA

Plot the K-M curves

ggsurvplot(km_fin, data = rossi_432, 
           palette = "aaas", ggtheme = theme_bw(),
           pval = TRUE)

Log - Log Plot for K-M estimation

  • Do these curves cross?
plot(survfit(S ~ fin, data = rossi_432), col = c(1:2), fun = "cloglog")

Cumulative Hazard Plot

ggsurvplot(km_fin, data = rossi_432, fun = "cumhaz",
           xlab = "Weeks of Follow-Up", palette = "aaas",
           pval = TRUE, break.time.by = 13,
           risk.table = TRUE, risk.table.height = 0.25)

Cumulative Hazard Plot

Cox Model using fin only

fit1 <- coxph(S ~ fin, data = rossi_432)

fit1
Call:
coxph(formula = S ~ fin, data = rossi_432)

          coef exp(coef) se(coef)      z      p
finyes -0.3691    0.6914   0.1897 -1.945 0.0517

Likelihood ratio test=3.84  on 1 df, p=0.05013
n= 432, number of events= 114 
  • How do we interpret the hazard ratio estimate of 0.6914?

fit1 model parameters

model_parameters(fit1, pretty_names = FALSE, exponentiate = TRUE,
                 ci = 0.90, digits = 3)
Parameter | Coefficient |    SE |         90% CI |      z |     p
-----------------------------------------------------------------
finyes    |       0.691 | 0.131 | [0.506, 0.945] | -1.945 | 0.052

Model fit1 summaries

model_performance(fit1)
# Indices of model performance

AIC      |     AICc |      BIC | Nagelkerke's R2 |  RMSE | Sigma
----------------------------------------------------------------
1348.924 | 1348.934 | 1352.993 |           0.009 | 0.513 | 0.000
glance(fit1) |> gt() |> fmt_number(decimals = 3)
n nevent statistic.log p.value.log statistic.sc p.value.sc statistic.wald p.value.wald statistic.robust p.value.robust r.squared r.squared.max concordance std.error.concordance logLik AIC BIC nobs
432.000 114.000 3.837 0.050 3.827 0.050 3.780 0.052 NA NA 0.009 0.956 0.546 0.023 −673.462 1,348.924 1,351.660 114.000

Forest Plot for fit1

ggforest(fit1)

Checking PH Assumption

cox.zph(fit1)
        chisq df    p
fin    0.0301  1 0.86
GLOBAL 0.0301  1 0.86

Checking PH Assumption

ggcoxzph(cox.zph(fit1))

A larger Cox model (fit2)

fit2 <- coxph(S ~ fin + age + race + wexp + prio, data = rossi_432)

fit2
Call:
coxph(formula = S ~ fin + age + race + wexp + prio, data = rossi_432)

              coef exp(coef) se(coef)      z       p
finyes    -0.36572   0.69369  0.19074 -1.917 0.05518
age       -0.05991   0.94185  0.02199 -2.724 0.00645
raceother -0.34582   0.70764  0.30736 -1.125 0.26053
wexpyes   -0.20788   0.81230  0.20905 -0.994 0.32003
prio       0.09212   1.09649  0.02831  3.254 0.00114

Likelihood ratio test=31.58  on 5 df, p=7.211e-06
n= 432, number of events= 114 

fit2 model parameters

model_parameters(fit2, pretty_names = FALSE, exponentiate = TRUE,
                 ci = 0.90, digits = 3)
Parameter | Coefficient |    SE |         90% CI |      z |     p
-----------------------------------------------------------------
finyes    |       0.694 | 0.132 | [0.507, 0.949] | -1.917 | 0.055
age       |       0.942 | 0.021 | [0.908, 0.977] | -2.724 | 0.006
raceother |       0.708 | 0.217 | [0.427, 1.173] | -1.125 | 0.261
wexpyes   |       0.812 | 0.170 | [0.576, 1.146] | -0.994 | 0.320
prio      |       1.096 | 0.031 | [1.047, 1.149] |  3.254 | 0.001
  • How do we interpret the fin hazard ratio estimate of 0.694?
  • How about the prio hazard ratio estimate of 1.096?

Model fit2 summaries

model_performance(fit2)
# Indices of model performance

AIC      |     AICc |      BIC | Nagelkerke's R2 |  RMSE | Sigma
----------------------------------------------------------------
1329.186 | 1329.327 | 1349.528 |           0.074 | 0.514 | 0.000
glance(fit2) |> gt() |> fmt_number(decimals = 3)
n nevent statistic.log p.value.log statistic.sc p.value.sc statistic.wald p.value.wald statistic.robust p.value.robust r.squared r.squared.max concordance std.error.concordance logLik AIC BIC nobs
432.000 114.000 31.575 0.000 31.994 0.000 30.720 0.000 NA NA 0.070 0.956 0.639 0.027 −659.593 1,329.186 1,342.867 114.000

Checking PH Assumption

cox.zph(fit2)
         chisq df      p
fin     0.0599  1 0.8066
age     6.0792  1 0.0137
race    2.0365  1 0.1536
wexp    4.1107  1 0.0426
prio    0.5348  1 0.4646
GLOBAL 17.0813  5 0.0043

Checking PH Assumption

ggcoxzph(cox.zph(fit2))

Compare our two models

plot(compare_performance(fit1, fit2))

Using cph() from the rms package

dd <- datadist(rossi_432); options(datadist = "dd")

fit1_cph <- cph(S ~ fin, data = rossi_432,
                x = TRUE, y = TRUE, surv = TRUE)

fit2_cph <- cph(S ~ fin + age + race + wexp + prio, data = rossi_432,
                x = TRUE, y = TRUE, surv = TRUE)

fit1_cph results

fit1_cph
Cox Proportional Hazards Model

cph(formula = S ~ fin, data = rossi_432, x = TRUE, y = TRUE, 
    surv = TRUE)

                        Model Tests    Discrimination    
                                              Indexes    
Obs        432    LR chi2      3.84    R2       0.009    
Events     114    d.f.            1    R2(1,432)0.007    
Center -0.1845    Pr(> chi2) 0.0501    R2(1,114)0.025    
                  Score chi2   3.83    Dxy      0.091    
                  Pr(> chi2) 0.0504                      

        Coef    S.E.   Wald Z Pr(>|Z|)
fin=yes -0.3691 0.1897 -1.95  0.0517  

fit1_cph Effects Plot

plot(summary(fit1_cph))
summary(fit1_cph)
             Effects              Response : S 

 Factor        Low High Diff. Effect   S.E.    Lower 0.95 Upper 0.95
 fin - yes:no  1   2    NA    -0.36907 0.18972 -0.74092   0.0027804 
  Hazard Ratio 1   2    NA     0.69138      NA  0.47668   1.0028000 

Proportional Hazards Assumption?

cox.zph(fit1_cph, transform = "km", global = TRUE)
        chisq df    p
fin    0.0301  1 0.86
GLOBAL 0.0301  1 0.86

Proportional Hazards Assumption?

ggcoxzph(cox.zph(fit1_cph, transform = "km", global = TRUE))

Bootstrap Validated fit1_cph summaries

set.seed(432)
validate(fit1_cph, B = 300)
      index.orig training   test optimism index.corrected   n
Dxy       0.0915   0.0889 0.0872   0.0017          0.0898 300
R2        0.0092   0.0108 0.0092   0.0016          0.0077 300
Slope     1.0000   1.0000 1.8983  -0.8983          1.8983 300
D         0.0021   0.0026 0.0021   0.0005          0.0016 300
U        -0.0015  -0.0015 0.0007  -0.0022          0.0007 300
Q         0.0036   0.0041 0.0014   0.0027          0.0009 300
g         0.1850   0.1804 0.1850  -0.0046          0.1896 300
  • Validated C statistic = 0.5 + (0.0898/2) = 0.5449
  • Validated Nagelkerke \(R^2\) = 0.0077

survplot() from rms fit1_cph

survplot(fit1_cph, fin = c("yes", "no"),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

fit1_cph Prediction Plot (1/3)

ggplot(Predict(fit1_cph))

fit1_cph Prediction Plot (2/3)

ggplot(Predict(fit1_cph, fun = exp))

fit1_cph Prediction Plot (3/3)

ggplot(Predict(fit1_cph, fun = plogis))

fit1_cph Nomogram

sv <- Survival(fit1_cph)
surv52 <- function(x) sv(52, lp = x)

plot(nomogram(fit1_cph, fun = surv52, funlabel = c("52 week survival")))

fit1_cph Nomogram

fit2_cph results

fit2_cph
Cox Proportional Hazards Model

cph(formula = S ~ fin + age + race + wexp + prio, data = rossi_432, 
    x = TRUE, y = TRUE, surv = TRUE)

                        Model Tests    Discrimination    
                                              Indexes    
Obs        432    LR chi2     31.58    R2       0.074    
Events     114    d.f.            5    R2(5,432)0.060    
Center -1.5427    Pr(> chi2) 0.0000    R2(5,114)0.208    
                  Score chi2  31.99    Dxy      0.277    
                  Pr(> chi2) 0.0000                      

           Coef    S.E.   Wald Z Pr(>|Z|)
fin=yes    -0.3657 0.1907 -1.92  0.0552  
age        -0.0599 0.0220 -2.72  0.0065  
race=other -0.3458 0.3074 -1.13  0.2605  
wexp=yes   -0.2079 0.2091 -0.99  0.3200  
prio        0.0921 0.0283  3.26  0.0011  

fit2_cph Effects Plot

plot(summary(fit2_cph))

fit2_cph Effects Table

summary(fit2_cph)
             Effects              Response : S 

 Factor             Low High Diff. Effect   S.E.     Lower 0.95 Upper 0.95
 age                20  27    7    -0.41932 0.153950 -0.72105   -0.1175800
  Hazard Ratio      20  27    7     0.65750       NA  0.48624    0.8890700
 prio                1   4    3     0.27642 0.084911  0.10999    0.4428400
  Hazard Ratio       1   4    3     1.31840       NA  1.11630    1.5571000
 fin - yes:no        1   2   NA    -0.36571 0.190740 -0.73955    0.0081226
  Hazard Ratio       1   2   NA     0.69370       NA  0.47733    1.0082000
 race - other:black  1   2   NA    -0.34585 0.307360 -0.94825    0.2565600
  Hazard Ratio       1   2   NA     0.70762       NA  0.38742    1.2925000
 wexp - no:yes       2   1   NA     0.20790 0.209060 -0.20184    0.6176400
  Hazard Ratio       2   1   NA     1.23110       NA  0.81723    1.8546000

survplot() from rms fit2_cph

  • Looking at the fin effect
survplot(fit2_cph, fin = c("no", "yes"),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

survplot() from rms fit2_cph

  • Looking at the race effect
survplot(fit2_cph, race = c("black", "other"),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

survplot() from rms fit2_cph

  • Looking at the age effect
survplot(fit2_cph, age = c(15, 25, 35, 45),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

survplot() from rms fit2_cph

  • Looking at the wexp effect
survplot(fit2_cph, wexp = c("no", "yes"),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

survplot() from rms fit2_cph

  • Looking at the prio effect
survplot(fit2_cph, prio = c(0, 2, 8, 16),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

fit2_cph Prediction Plots (1/3)

ggplot(Predict(fit2_cph))

fit2_cph Prediction Plots (2/3)

ggplot(Predict(fit2_cph, fun = exp))

fit2_cph Prediction Plots (3/3)

ggplot(Predict(fit2_cph, fun = plogis))

Proportional Hazards Assumption?

cox.zph(fit2_cph, transform = "km", global = TRUE)
         chisq df      p
fin     0.0599  1 0.8066
age     6.0790  1 0.0137
race    2.0366  1 0.1536
wexp    4.1106  1 0.0426
prio    0.5346  1 0.4647
GLOBAL 17.0806  5 0.0043

Proportional Hazards Assumption?

ggcoxzph(cox.zph(fit2_cph, transform = "km", global = TRUE))

Bootstrap Validated fit2_cph summaries

set.seed(432)
validate(fit2_cph, B = 300)
      index.orig training   test optimism index.corrected   n
Dxy       0.2774   0.3002 0.2657   0.0345          0.2429 300
R2        0.0737   0.0851 0.0652   0.0199          0.0538 300
Slope     1.0000   1.0000 0.8892   0.1108          0.8892 300
D         0.0226   0.0265 0.0199   0.0067          0.0159 300
U        -0.0015  -0.0015 0.0012  -0.0027          0.0012 300
Q         0.0241   0.0280 0.0187   0.0094          0.0147 300
g         0.6301   0.6808 0.5889   0.0919          0.5382 300
  • Validated C statistic = 0.5 + (0.2429/2) = 0.62145
  • Validated Nagelkerke \(R^2\) = 0.0538

Compare Performance

plot(compare_performance(fit1_cph, fit2_cph))

fit2_cph Nomogram

sv <- Survival(fit2_cph)
surv52 <- function(x) sv(52, lp = x)

plot(nomogram(fit2_cph, fun = surv52, funlabel = c("52 week survival")))

fit2_cph Nomogram

Add Non-Linear Terms?

  • Notice use of week rather than S here…
plot(spearman2(week ~ fin + age + race + wexp + prio, data = rossi_432))

Model fit3_cph

  • We’ll include a four-knot restricted cubic spline in age, and the interaction between prio and wexp.
fit3_cph <- cph(S ~ rcs(age, 4) + prio * wexp + fin + race, 
                data = rossi_432, x = T, y = T, surv = T)

fit3_cph

fit3_cph
Cox Proportional Hazards Model

cph(formula = S ~ rcs(age, 4) + prio * wexp + fin + race, data = rossi_432, 
    x = T, y = T, surv = T)

                        Model Tests    Discrimination    
                                              Indexes    
Obs        432    LR chi2     35.99    R2       0.084    
Events     114    d.f.            8    R2(8,432)0.063    
Center -5.4645    Pr(> chi2) 0.0000    R2(8,114)0.218    
                  Score chi2  41.91    Dxy      0.298    
                  Pr(> chi2) 0.0000                      

                Coef    S.E.   Wald Z Pr(>|Z|)
age             -0.2617 0.1074 -2.44  0.0148  
age'             1.7834 1.2207  1.46  0.1440  
age''           -3.1846 2.3664 -1.35  0.1784  
prio             0.0985 0.0331  2.98  0.0029  
wexp=yes        -0.1092 0.3097 -0.35  0.7245  
fin=yes         -0.3441 0.1920 -1.79  0.0730  
race=other      -0.3828 0.3105 -1.23  0.2176  
prio * wexp=yes -0.0100 0.0663 -0.15  0.8806  

fit3_cph Effects Plot

plot(summary(fit3_cph))

fit3_cph Effects Table

summary(fit3_cph)
             Effects              Response : S 

 Factor             Low High Diff. Effect   S.E.    Lower 0.95 Upper 0.95
 age                20  27    7    -0.45052 0.27623 -0.991920  0.090883  
  Hazard Ratio      20  27    7     0.63730      NA  0.370870  1.095100  
 prio                1   4    3     0.26574 0.17154 -0.070485  0.601960  
  Hazard Ratio       1   4    3     1.30440      NA  0.931940  1.825700  
 wexp - no:yes       2   1   NA     0.12909 0.23346 -0.328480  0.586660  
  Hazard Ratio       2   1   NA     1.13780      NA  0.720010  1.798000  
 fin - yes:no        1   2   NA    -0.34414 0.19198 -0.720420  0.032138  
  Hazard Ratio       1   2   NA     0.70883      NA  0.486550  1.032700  
 race - other:black  1   2   NA    -0.38281 0.31051 -0.991400  0.225770  
  Hazard Ratio       1   2   NA     0.68194      NA  0.371060  1.253300  

Adjusted to: prio=2 wexp=yes  

survplot() from rms fit3_cph

  • Looking at the fin effect
survplot(fit3_cph, fin = c("no", "yes"),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

survplot() from rms fit3_cph

  • Looking at the age effect
survplot(fit3_cph, age = c(15, 25, 35, 45),
         time.inc = 13, type = "kaplan-meier",
         xlab = "Study Time in Weeks")

fit3_cph Prediction Plots (raw)

ggplot(Predict(fit3_cph))

fit3_cph Prediction Plots (plogis)

ggplot(Predict(fit3_cph, fun = plogis))

Proportional Hazards Assumption?

ggcoxzph(cox.zph(fit3_cph, transform = "km", global = TRUE))

Bootstrap Validated fit3_cph summaries

set.seed(432)
validate(fit3_cph, B = 300)
      index.orig training   test optimism index.corrected   n
Dxy       0.2981   0.3269 0.2706   0.0564          0.2417 300
R2        0.0836   0.1016 0.0690   0.0326          0.0510 300
Slope     1.0000   1.0000 0.8319   0.1681          0.8319 300
D         0.0259   0.0321 0.0211   0.0110          0.0149 300
U        -0.0015  -0.0015 0.0019  -0.0034          0.0019 300
Q         0.0274   0.0336 0.0192   0.0144          0.0130 300
g         0.6038   0.6922 0.5651   0.1271          0.4766 300
  • Validated C statistic = 0.5 + (0.2417/2) = 0.62085
  • Validated Nagelkerke \(R^2\) = 0.0510

Compare Performance

plot(compare_performance(fit1_cph, fit2_cph, fit3_cph))

fit3_cph Nomogram

sv <- Survival(fit3_cph)
surv52 <- function(x) sv(52, lp = x)

plot(nomogram(fit3_cph, fun = surv52, funlabel = c("52 week survival")))

fit3_cph Nomogram